NSF PAR Search | NSF Public Access Repository

Wav2Gloss: Generating Interlinear Glossed Text from Speech

https://doi.org/10.18653/v1/2024.acl-long.34

He, Taiqi; Choi, Kwanghee; Tjuatja, Lindia; Robinson, Nathaniel; Shi, Jiatong; Watanabe, Shinji; Neubig, Graham; Mortensen, David; Levin, Lori (August 2024, Association for Computational Linguistics)

Full Text Available
Generalized glossing guidelines: An explicit, human- and machine-readable, item-and-process convention for morphological annotation

Mortensen, David R. (January 2023, Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology)

Interlinear glossing provides a vital type of morphosyntactic annotation, both for linguists and language revitalists, and numerous conventions exist for representing it formally and computationally. Some of these formats are human readable; others are machine readable. Some are easy to edit with general-purpose tools. Few represent non-concatenative processes like infixation, reduplication, mutation, truncation, and tonal overwriting in a consistent and formally rigorous way (on par with affixation). We propose an annotation convention, Generalized Glossing Guidelines (GGG), that combines all of these positive properties using an Item-and-Process (IP) framework. We describe the format, demonstrate its linguistic adequacy, and compare it with two other interlinear glossed text annotation schemes.
Full Text Available
Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models

https://doi.org/10.18653/v1/2023.emnlp-main.614

Ahia, Orevaoghene; Kumar, Sachin; Gonen, Hila; Kasai, Jungo; Mortensen, David; Smith, Noah; Tsvetkov, Yulia (January 2023, Association for Computational Linguistics)

Language models have graduated from being research prototypes to commercialized products offered as web APIs, and recent works have highlighted the multilingual capabilities of these products. The API vendors charge their users based on usage, more specifically on the number of "tokens" processed or generated by the underlying language models. What constitutes a token, however, is training data and model dependent with a large variance in the number of tokens required to convey the same information in different languages. In this work, we analyze the effect of this non-uniformity on the fairness of an API's pricing policy across languages. We conduct a systematic analysis of the cost and utility of OpenAI's language model API on multilingual benchmarks in 22 typologically diverse languages. We show evidence that speakers of a large number of the supported languages are overcharged while obtaining poorer results. These speakers tend to also come from regions where the APIs are less affordable, to begin with. Through these analyses, we aim to increase transparency around language model APIs' pricing policies and encourage the vendors to make them more equitable.
Full Text Available
SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing

https://doi.org/10.18653/v1/2023.sigmorphon-1.22

He, Taiqi; Tjuatja, Lindia; Robinson, Nathaniel; Watanabe, Shinji; Mortensen, David R.; Neubig, Graham; Levin, Lori (January 2023, Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology)

Full Text Available
Generalized Glossing Guidelines: An Explicit, Human- and Machine-Readable, Item-and-Process Convention for Morphological Annotation

https://doi.org/10.18653/v1/2023.sigmorphon-1.7

Mortensen, David R.; Gulsen, Ela; He, Taiqi; Robinson, Nathaniel; Amith, Jonathan; Tjuatja, Lindia; Levin, Lori (January 2023, Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology)

Full Text Available
Cross-Cultural Similarity Features for Cross-Lingual Transfer Learning of Pragmatically Motivated Tasks

https://doi.org/10.18653/v1/2021.eacl-main.204

Sun, Jimin; Ahn, Hwijeen; Park, Chan Young; Tsvetkov, Yulia; Mortensen, David R. (January 2021, The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL))

Full Text Available
Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties

Siminyu, Kathleen; Li, Xinjian; Anastasopoulos, Antonios; Mortensen, David R.; Marlo, Michael; Neubig, Graham (January 2021, 22nd Annual Conference of the International Speech Communication Association (InterSpeech 2021))
Full Text Available
Evaluating the Morphosyntactic Well-formedness of Generated Texts

https://doi.org/10.18653/v1/2021.emnlp-main.570

Pratapa, Adithya; Anastasopoulos, Antonios; Rijhwani, Shruti; Chaudhary, Aditi; Mortensen, David R.; Neubig, Graham; Tsvetkov, Yulia (January 2021, Evaluating the Morphosyntactic Well-formedness of Generated Texts)

Text generation systems are ubiquitous in natural language processing applications. However, evaluation of these systems remains a challenge, especially in multilingual settings. In this paper, we propose L’AMBRE – a metric to evaluate the morphosyntactic well-formedness of text using its dependency parse and morphosyntactic rules of the language. We present a way to automatically extract various rules governing morphosyntax directly from dependency treebanks. To tackle the noisy outputs from text generation systems, we propose a simple methodology to train robust parsers. We show the effectiveness of our metric on the task of machine translation through a diachronic study of systems translating into morphologically-rich languages.
Full Text Available
Automatic Extraction of Rules Governing Morphological Agreement

https://doi.org/10.18653/v1/2020.emnlp-main.422

Chaudhary, Aditi; Anastasopoulos, Antonios; Pratapa, Adithya; Mortensen, David R.; Sheikh, Zaid; Tsvetkov, Yulia; Neubig, Graham (January 2020, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP))
Full Text Available
AlloVera: A Multilingual Allophone Database

Mortensen, David R.; Li, Xinjian; Littell, Patrick; Michaud, Alexis; Rijhwani, Shruti; Anastasopoulos, Antonios; Black, Alan W; Metze, Florian; Neubig, Graham (May 2020, Proceedings of The 12th Language Resources and Evaluation Conference)

Full Text Available
